learner model
Curriculum Design for Teaching via Demonstrations: Theory and Applications
We consider the problem of teaching via demonstrations in sequential decisionmaking settings. In particular, we study how to design a personalized curriculum over demonstrations to speed up the learner's convergence. We provide a unified curriculum strategy for two popular learner models: Maximum Causal Entropy Inverse Reinforcement Learning (MaxEnt-IRL) and Cross-Entropy Behavioral Cloning (CrossEnt-BC). Our unified strategy induces a ranking over demonstrations based on a notion of difficulty scores computed w.r.t. the teacher's optimal policy and the learner's current policy. Compared to the state of the art, our strategy doesn't require access to the learner's internal dynamics and still enjoys similar convergence guarantees under mild technical conditions. Furthermore, we adapt our curriculum strategy to the setting where no teacher agent is present using task-specific difficulty scores. Experiments on a synthetic car driving environment and navigation-based environments demonstrate the effectiveness of our curriculum strategy.
Appendix
In practice, building f and g requires the computation for wtiwtj for all i,j. B.2 Classification For the classification task with the logistic regression model, we modify the formula of logistic regression in teaching objectives to make it convenient for derivation. It also indicates that with probability at least p1, the LST teacher can achieve exponential teachability in the iteration t. In order to achieve exponential teachiability in T iterations, the sufficient condition in Eq. (22) must be satisfied in all T iterations. Then, we use a pre-trained DenseNet [65] shown in [53] to generate 1024 dim features and the confidencescoreforeachimage.
Dynamic Jointly Batch Selection for Data Efficient Machine Translation Fine-Tuning
Ghanizadeh, Mohammad Amin, Dousti, Mohammad Javad
Data quality and its effective selection are fundamental to improving the performance of machine translation models, serving as cornerstones for achieving robust and reliable translation systems. This paper presents a data selection methodology specifically designed for fine-tuning machine translation systems, which leverages the synergy between a learner model and a pre-trained reference model to enhance overall training effectiveness. By defining a learnability score, our approach systematically evaluates the utility of data points for training, ensuring that only the most relevant and impactful examples contribute to the fine-tuning process. Furthermore, our method employs a batch selection strategy which considers interdependencies among data points, optimizing the efficiency of the training process while maintaining a focus on data relevance. Experiments on English to Persian and several other language pairs using an mBART model fine-tuned on the CCMatrix dataset demonstrate that our method can achieve up to a fivefold improvement in data efficiency compared to an iid baseline. Experimental results indicate that our approach improves computational efficiency by 24 when utilizing cached embeddings, as it requires fewer training data points. Additionally, it enhances generalization, resulting in superior translation performance compared to random selection method.
Reviews: Learner-aware Teaching: Inverse Reinforcement Learning with Preferences and Constraints
This paper formalizes the problem of inverse reinforcement learning in which the learner's goal is not only to imitate the teacher's demonstration, but also to satisfy her own preferences and constraints. It analyzes the suboptimality of learner-agnostic teaching, where the teacher gives demonstrations without considering the learner's preferences. It then proposes a learner-aware teaching algorithm, where the teacher selects demonstrations while accounting for the learner's preferences. It considers different types of learner models with hard or soft preference constraints. It also develops learner-aware teaching methods for both cases where the teacher has full knowledge of the learner's constraints or does not know it.